Hierarchical Clustering Algorithm for Multi-Camera Vehicle Trajectories Based on Spatio-Temporal Grouping under Intelligent Transportation and Smart City

With the emergence of intelligent transportation and smart city system, the issue of how to perform an efficient and reasonable clustering analysis of the mass vehicle trajectories on multi-camera monitoring videos through computer vision has become a significant area of research. The traditional trajectory clustering algorithm does not consider camera position and field of view and neglects the hierarchical relation of the video object motion between the camera and the scenario, leading to poor multi-camera video object trajectory clustering. To address this challenge, this paper proposed a hierarchical clustering algorithm for multi-camera vehicle trajectories based on spatio-temporal grouping. First, we supervised clustered vehicle trajectories in the camera group according to the optimal point correspondence rule for unequal-length trajectories. Then, we extracted the starting and ending points of the video object under each group, hierarchized the trajectory according to the number of cross-camera groups, and supervised clustered the subsegment sets of different hierarchies. This method takes into account the spatial relationship between the camera and video scenario, which is not considered by traditional algorithms. The effectiveness of this approach has been proved through experiments comparing silhouette coefficient and CPU time.


Introduction
Vehicle trajectory data analysis is a research hotspot in intelligent transportation and smart city [1]. People can deeply understand their life trajectory, social behavior, environmental change, and urban evolution using video GIS, video object recognition, and visual analysis of video object trajectory [2,3]. In recent years, the urban video monitoring system has gradually developed from single-camera processing to multi-camera equipment joint analysis. The system generates a large number of trajectory data under the multi-camera joint monitoring system and records the movement of people, vehicles, and animals in various scenarios [4], with such characteristics as easy deployment, intuitive information, and rich media expression. Dividing each vehicle trajectory into a suitable cluster by measuring its similarity is a crucial and challenging aspect of trajectory analysis [5]. However, there are some areas for improvement in current clustering methods for video object trajectory research. Specifically, the analysis object is limited to image trajectory [6], ignoring the actual trajectory of the video object in geographical space. Additionally, the traditional trajectory clustering algorithm does not consider the geographic space relation of multicameras and only clusters datasets in a single camera and a single scenario, resulting in poor real-time performance and accuracy in large-scale cross-camera or even cross-camera group trajectory data.
Clustering large-scale trajectory data is challenging. First, the trajectory is complex, so the computational cost for clustering analysis is high. Second, the trajectories sampled by the multi-camera joint monitoring system are not equidistant, making it difficult to unify the dimensions between different video object trajectories, so reasonable point correspondence rules should be given to measure the difference in distance calculation. Finally, there is a vision-blind area and a long distance between camera groups, so the direct calculation without simplifying trajectories causes the local differences between trajectories to be overlooked due to the large global scope. The traditional algorithms only cluster the equidistant trajectories of sampling points under a single camera for small-scale datasets without considering the hierarchical relation between camera position, the field of view, and the motion of video objects between cameras and scenarios and without reasonable analysis of cross-group trajectory data. Therefore, this paper proposed a hierarchical clustering algorithm for multi-camera vehicle trajectories based on spatio-temporal grouping. Based on our previous work, we have done further research in this paper [7]. In this paper, the camera groups are divided according to the spatial distribution law of camera equipment in reality, which effectively makes up for the deficiency of traditional methods that do not take into account the spatial relationship between camera equipment. The point correspondence rule of optimal unequal length trajectories and the overlapping scale factor of trajectory distance under camera-joint system are proposed to effectively calculate the distance between unequal multi-camera trajectories and effectively reduce the computational cost compared with the traditional methods. In addition, this paper also proposes a method to automatically obtain labels of video object trajectories in the camera group and across camera groups, which can achieve better clustering results through additional useful supervision information.
This paper is mainly structured as follows: Section 2 introduces the relevant work in the research field involved; Section 3 presents the method description, including the spatialization of video object trajectory, scenarios and division rule of camera groups, point correspondence rule of optimal unequal length trajectory, overlapping scale factor of trajectory distance under camera-joint system, and the details of the supervised video object clustering algorithm within and between camera groups; Section 4 demonstrates the visualization effect of the algorithm on the dataset and evaluates the accuracy and efficiency of the method through experiments; Section 5 discusses the implications of this work, as well as the limitations of this work and how it will be addressed in future works; Section 6 presents the conclusions and prospects.

Related Work
The research fields involved in this paper include video-geographic scenario data fusion organization and trajectory clustering. The research status of relevant fields is as follows.

Video-Geographic Scenario Data Fusion Organization
The data fusion organization of video-geographic scenarios is the basis of video object analysis combining geospatial information. Based on the concepts of Multi Media GIS [8], Geo Video [9], and Video GIS [10], the metadata description method [11] and GPS correlation method [12] were constructed in the earlier research, and geographic retrieval and play of video images were realized by describing the geographic location of video frames.
In recent years, research has focused more on the fusion of video content and geographic scenarios [13]. The fusion of video content and geographic scene can be subdivided into the method of strengthening video through geographic scene and the method of strengthening geographic scene by video [14]. The output result of the former is enhanced video, and the video content is visually enhanced by using the spatial information in the geographic scene [15], while the latter is constructed and presented through the geographic scene [16]. According to the different mapping objects, it can be divided into video frame picture projection and video object projection. The video frame picture projection maps the video frame image to the geographic scene according to the camera parameters. The video object projection realizes the separation of the front and rear scenes through the related theory of computer vision and projects into the geographical scene. The picture projection of video frame includes the methods of correlation [17], fixed plane play [18], global mapping [19], and so on. The video object projection can be divided into the methods of foreground and background independent projection [20], foreground projection [21], and foreground abstraction [22].

Trajectory Clustering
Common in pattern recognition, data analysis, and machine learning, trajectory clustering is an efficient method for analyzing trajectory data, aiming to obtain space, time, and even potential information in trajectory data. It is widely applied in such fields as object motion prediction [23], traffic monitoring [24], behavior understanding [25], anomaly detection [26], and 3D reconstruction [27]. In addition, data representation, feature extraction, and distance measurement selection are the key preliminary work of trajectory clustering. For example, a trajectory can be represented as a vector and downsampled to a uniform length, so the Euclidean distance [28] can be used. Trajectories can also be considered as samples of the probability distribution, so Bhattacharyya distance [29] can be used to measure the distance between two trajectories.
According to the availability of labeled data, there are three trajectory clustering methods: unsupervised, supervised, and semi-supervised models. The unsupervised model aims to cluster the data without human experts supervising or labeling the data and obtain the reasoning function by analyzing the unlabeled dataset [30]. The supervised model is learned before trajectory clustering. The labeled data are usually used to learn the function that maps data to its label, the function that predicts unlabeled data clustering [31]. The labeled data often require heavy manual work by experts. The semi-supervised algorithm is trained by labeled data and adjusted by unlabeled data [32].
The unsupervised trajectory clustering algorithm does not rely on other prior knowledge but uses constructors to describe the implicit relationship between trajectory samples. The representative methods are densely clustering models [33], hierarchical clustering models [34], and spectral clustering models [35]. By directly defining the density-reachable rule, densely clustering models can obtain the maximum set of density connected samples as cluster clusters. Hierarchical clustering models are divided into "top-down [36]" type and "bottom-up [37]" type. The former regards each trajectory sample as a cluster and gradually merges into larger clusters by defining the similarity between clusters; the latter gradually divides the trajectory data set into smaller groups by using the idea of divide and conquer until it meets the requirements of clustering. Supervised trajectory clustering algorithm is represented by the K-NN algorithm [38] and statistical model method [39]. The K-NN algorithm uses the average nearest distance as the evaluation criterion to realize trajectory clustering. The statistical model method uses the Gaussian mixture model [40] and other statistical models to construct the trajectory category probability function to obtain the trajectory sample category. In addition to the above methods, in recent years, with the continuous extension of deep learning, there are a large number of deep learning algorithms [41,42]. Semi-supervised trajectory clustering algorithm [43,44] fully combines supervised clustering algorithm with unsupervised clustering algorithm to reduce the label labor cost and minimize the over-fitting problem in the clustering process. The basic idea is to update the classifier by classifying the trajectory data and clustering the new data.
In order to understand the three kinds of algorithms more clearly, we give the tabular overview of different trajectory clustering algorithms in Table 1.

Spatialization of Video Object Trajectory
Building the mapping relation between each camera image square plane and geographical space object square plane is necessary to transform from image space coordinates to geographical space coordinates. This paper used the contact point between the video object subgraph and the ground as the locating point and sampled at a certain time interval [45]. The video object trajectory in the geographic scenario was obtained from a building mapping relation between image space and geographic space [46], as shown in Figure 1.

Spatialization of Video Object Trajectory
Building the mapping relation between each camera image square plane and geographical space object square plane is necessary to transform from image space coordinates to geographical space coordinates. This paper used the contact point between the video object subgraph and the ground as the locating point and sampled at a certain time interval [45]. The video object trajectory in the geographic scenario was obtained from a building mapping relation between image space and geographic space [46], as shown in Figure 1. This study uses the homography matrix method to construct a mapping model. If the image coordinate of a point is , and the geographic space coordinate is , then the homogeneous coordinate of and can be represented as If the mapping matrix is , the relational expression between and is = .
(3) This study uses the homography matrix method to construct a mapping model. If the image coordinate of a point is q, and the geographic space coordinate is Q, then the homogeneous coordinate of q and Q can be represented as If the mapping matrix is M, the relational expression between q and Q is Sensors 2023, 23, 6909

of 19
The image plane is scaled, translated, and rotated to a geospatial plane, so the mapping matrix M can be decomposed into where s is the scale factor; W is the camera translation transformation matrix; R is a 3 × 4 dimensional rotation transformation matrix.
where f u and f v represent the product of the physical focal length of the lens and the sensor size in the horizontal and vertical axis directions of each unit, respectively; u and v represent the offset of the imaging center relative to the principal optic axis on the horizontal and vertical axis, respectively; r 1 , r 2 , and r 3 represent the rotation relation of the coordinate system in the X-axis, Y-axis, and Z-axis in the physical space, respectively; e is the translation relation between coordinate systems. When using the homography matrix method, it is assumed that the camera field of view plane in geographic space is horizontal-that is, Z = 0 at the plane. Thus, the mapping relation between image and geographic space can be regarded as the mapping from one plane to another. To simplify the calculation, the Z in Q and the r 3 rotated about the Z-axis in R are removed. Hence, the homography matrix M is simplified as The geospatial coordinates of the video object trajectory can be obtained according to the solution of the matrix M.

Scenarios and Division Rule of Camera Groups
The closely adjacent location cameras within the group and the overlapped camera field of view are preferred when cameras are divided into groups. The weigh options rules between the two can be specified as where C represents all cameras to be divided, and P(h) represents the nearest neighbor in a position, which can be stated as the reciprocal of the distance between positions: P(D|h) is the degree of view overlap, expressed by the area of view overlap: ε represents the normalized constant parameter.

Point Correspondence Rule of Optimal Unequal Length Trajectories
Inspired by the dynamic time warping [47], to better compare the distances between two trajectories with different lengths, it is necessary to establish one-to-many and many- to-one matching so that the two trajectories have the same pattern of wave troughs and peaks perfectly matched. Two unequal-length trajectory sequences are known: Tra x = p x,j (j = 1, · · · , len x ) , Tra y = p y,j j = 1, · · · , len y , len x = len y .
The trajectory sequence may not have equidistant time points. A feature space represented by F is fixed. Its local distance measure is defined as a function: The defined sequence is the point correspondence sequence of optimal unequallength trajectory: Seq(S) = s 1 , s 2 , · · · , s e , · · · , s len(Seq(S)) . (13) Then, the sequence Seq(S) satisfies The distance cost between Tra x and Tra y is written as c Tra x n l , Tra y m l .
The optimal total cost between Tra x and Tra y is shown as the minimum value of c p Tra x , Tra y : min c p Tra x , Tra y .

Overlapping Scale Factor of Trajectory Distance under Camera-Joint System
This part explains the increase in distance calculation due to the time overlap of multiple cameras in the camera-joint system. Then it gives a method to obtain the scale factor of trajectory distance overlap to offset the increase in distance.
Assuming G as a camera group, According to the capture sequence, the trajectory corresponding to the video object can be expressed as a series of nodes: where which represents that the video object is captured by Cam i,j at this node. Assuming the two trajectories Tra 1 and Tra 2 under the camera group formed by cameras Cam 1 and Cam 2 are shown in Figure 2. In , the total duration under two cameras is , which can be divided into , , , in which , are captured by only one camera while is captured by two cameras at the same time; is only captured by during time . If the distance between the two is calculated directly, the density of trajectory nodes will increase as cameras overlap in time of , leading to an increase in the distance value. Therefore, is obtained first to eliminate the distance of the added part. The total global duration of is = , = , respectively, while the actual calculation duration is = + , = . Therefore, the scale factor of overlapping trajectories of and can be obtained as follows: This factor should be multiplied in the distance calculation to eliminate the increase in distance due to camera overlap.
This part details the trajectory clustering algorithm within the camera group (SCAIMG) and between camera groups (SCAIBG). The process of the algorithm is shown in Figure 3. In Tra 1 , the total duration under two cameras is t 1 , which can be divided into t 2 , t 3 , t 4 , in which t 3 , t 4 are captured by only one camera while t 2 is captured by two cameras at the same time; Tra 1 is only captured by Cam 1 during time t 5 . If the distance between the two is calculated directly, the density of trajectory nodes will increase as cameras overlap in time t 2 of Tra 1 , leading to an increase in the distance value. Therefore, Tra 1 is obtained first to eliminate the distance of the added part. The total global duration of Tra 2 is α 1 = t 1 , α 2 = t 5 , respectively, while the actual calculation duration is β 1 = t 1 + t 2 , β 2 = t 5 . Therefore, the scale factor of overlapping trajectories of Tra 1 and Tra 2 can be obtained as follows: This factor should be multiplied in the distance calculation to eliminate the increase in distance due to camera overlap.
This part details the trajectory clustering algorithm within the camera group (SCAIMG) and between camera groups (SCAIBG). The process of the algorithm is shown in Figure 3.

Supervised Clustering Label Acquisition
Supervised clustering results can be obtained with more additional useful supervised information than unsupervised clustering. For the video object trajectory under the camera joint system in this chapter, the "main camera" of video objects is defined as the label for the supervised clustering of trajectory samples. For the trajectory between camera groups, the number sequence of the camera group passed through is used as the label of the supervised clustering of the trajectory sample.
In the camera group of the camera-joint system, the camera number with the most capture times should be the same among video objects with the same characteristics. Therefore, for accurate trajectory data clustering, we use the camera number with the highest capture times corresponding to each video object as the label of supervised clustering: Furthermore, we used the camera group number as the label of supervised clustering between camera groups. This factor should be multiplied in the distance calculation to eliminate the increase in distance due to camera overlap.
This part details the trajectory clustering algorithm within the camera group (SCAIMG) and between camera groups (SCAIBG). The process of the algorithm is shown in Figure 3.

Trajectory Clustering Algorithm within the Camera Group
The trajectory clustering algorithm within the camera group aims to obtain a group of clustering centers. First, the video objects in the camera group are spatialized, followed by clustering their trajectory. The pseudo-code of trajectory clustering within the group (SCAIMG) is shown in Algorithm 1. For each sample s from φ: 8 : calculate the distance between s to each center c τ : d(s, c τ ) ← min c p (s, c τ ) 9 : In Algorithm 1, line 11 and line 13 represent the operations of "approaching" and "moving away", respectively. The schematic diagram of the approaching operation is shown in Figure 4. It is assumed that label(s) == label(c τ ) is known, and every node in the trajectory s moves towards c τ .
The trajectory of the "moving away" operation moves in the opposite direction in Figure 4.  The trajectory of the "moving away" operation moves in the opposite direction in Figure 4.

Trajectory Clustering Algorithm between Groups
We can combine the entry and departure points in each camera field of view of video object trajectories to reflect the dynamic characteristics of group trajectories [48,49] because the range of trajectories between groups is larger than that within a group. The entry point and departure point of the vehicle video object under each camera group are taken as the trajectory sampling points to form multiple trajectory subsegments, which are then hierarchically clustered.
The sampled trajectory is supposed as: , = , , , , , , , , where , represents the entry point or departure point information, and , represents the -th camera group number that video passes through. , , , represent the coordinates, and , represents the timestamp. Figure 5 shows the trajectory , , in camera 1-3 . passes through 1 , 2 , and 3 , with five hierarchies; passes through 1 and 2 , with three hierarchies; passes through 3 , with one hierarchy. , contain trajectory subsegments of three hierarchies starting at 1 and ending at 2; , contain trajectory subsegments in 3, with one hierarcy. We proposed a trajectory clustering algorithm between camera groups with multihierarchies to reasonably analyze the cross-camera group trajectories. The algorithm aims to obtain the sets of cluster centers: where (2ℎ − 1) = , , , … , , , … , , , ℎ, ∈ ℕ, , , where refers to the number of cross-camera groups of trajectories within a group, (2ℎ − 1) refers to the cluster center set when the number of cross-camera groups of trajectories within groups is 2ℎ − 1, and refers to the -th cluster center.

Trajectory Clustering Algorithm between Groups
We can combine the entry and departure points in each camera field of view of video object trajectories to reflect the dynamic characteristics of group trajectories [48,49] because the range of trajectories between groups is larger than that within a group. The entry point and departure point of the vehicle video object under each camera group are taken as the trajectory sampling points to form multiple trajectory subsegments, which are then hierarchically clustered.
The sampled trajectory is supposed as: where p i,j represents the entry point or departure point information, and Group i,j represents the j-th camera group number that video f i passes through. x i,j , y i,j represent the coordinates, and t i,j represents the timestamp. Figure 5 shows the trajectory f 1 , f 2 , f 3 in camera Groups 1-3. f 1 passes through Group 1, Group 2, and Group 3, with five hierarchies; f 2 passes through Group 1 and Group 2, with three hierarchies; f 3 passes through Group 3, with one hierarchy. f 1 , f 2 contain trajectory subsegments of three hierarchies starting at Group 1 and ending at Group 2; f 1 , f 3 contain trajectory subsegments in Group 3, with one hierarcy. The pseudo-code of Supervised trajectory clustering algorithm considering camera information between groups (SCAIBG) is shown in Algorithm 2. We proposed a trajectory clustering algorithm between camera groups with multihierarchies to reasonably analyze the cross-camera group trajectories. The algorithm aims to obtain the λ sets of cluster centers: where C(2h − 1) = c 2h−1,1 , c 2h−1,2 . . . , c 2h−1,k , . . . , c 2h−1,l , h, k ∈ N, c 2h−1,k = average f 2h−1,m , f 2h−1,m+1 , . . . , f 2h−1,m+2(λ−1) , where λ refers to the number of cross-camera groups of trajectories within a group, C(2h − 1) refers to the cluster center set when the number of cross-camera groups of trajectories within groups is 2h − 1, and k refers to the k-th cluster center. The pseudo-code of Supervised trajectory clustering algorithm considering camera information between groups (SCAIBG) is shown in Algorithm 2.

Experimental Analysis
This part briefly introduces the experimental conditions and data and the trajectory clustering results within and between camera groups. Compared with traditional clustering algorithms, the advantages of the proposed algorithm are demonstrated by silhouette coefficient evaluation, and the time complexity is analyzed to prove the algorithm's effectiveness.

Experimental Conditions and Data
The experimental data in this paper come from the CityFlowV2 dataset [50] newly launched by NVIDIA, the first large-scale dataset in the world to support cross-camera vehicle tracking, accommodating more than ten intersections and 40 cameras at the same time, with a spatial span greater than 3 km 2 . This dataset contains high-definition synchronous videos collected in Dubuque, the United States, including scenarios such as residential areas and expressways.
The experimental data in this paper are all the original dataset's video sequences. The experimental environment comprises software (Windows10, python 3.6 + sklearn0.0) and hardware (Intel (R) Core (TM) i7-10510U CPU @ 1.80 GHz 2.30 GHz, RAM 12.0 GB, and NVIDIA GeForce MX250). The algorithm to obtain the video object trajectory by preprocessing is as follows: the video dynamic object detection algorithm is Mask-RCNN [51], the tracking algorithm is TNT [52], and the cross-camera re-recognition algorithm is Deep SORT [53].

Camera Grouping of the CityFlowV2 Dataset
The CityFlowV2 dataset is divided into groups according to Equations (9) and (10). Partial division results are shown in Figure 6. In addition to the groups shown in Figure 6, Cam 1 to Cam 5 are a group, and Cam 6 to Cam 9 are a group.

Camera Grouping of the CityFlowV2 Dataset
The CityFlowV2 dataset is divided into groups according to Equations (9) and (10) Partial division results are shown in Figure 6. In addition to the groups shown in Figur  6, Cam 1 to Cam 5 are a group, and Cam 6 to Cam 9 are a group.

Determination of the Number of Cluster Centers
The number of cluster centers is determined using the elbow method [54]. Due to th continuous optimization of the cluster center and smaller quadratic sum, the first inflec tion point appearing in the change of the quadratic sum is chosen as the best K value. (32 For SCAIMG and SCAIBG, the cluster center value of each camera group is deter mined using the elbow method, as shown in Figures 7 and 8. The number of cluster center of each camera is the abscissa K corresponding to the blue box.

Determination of the Number of Cluster Centers
The number of cluster centers is determined using the elbow method [54]. Due to the continuous optimization of the cluster center and smaller quadratic sum, the first inflection point appearing in the change of the quadratic sum is chosen as the best K value.

min(SSE).
(31) For SCAIMG and SCAIBG, the cluster center value of each camera group is determined using the elbow method, as shown in Figures 7 and 8. The number of cluster centers of each camera is the abscissa K corresponding to the blue box.   According to Figures 7 and 8, we can respectively determine the reasonable number of cluster centers K according to the number of cluster centers in each group-the inflection point in the curve of average distortion degree.  According to Figures 7 and 8, we can respectively determine the reasonable number of cluster centers K according to the number of cluster centers in each group-the inflection point in the curve of average distortion degree.

Determination of the Initial Cluster Center
If the initial cluster centers for SCAIMG and SCAIBG are randomly selected, this usually results in slightly different clusters at the end when the clustering algorithm is reruned. In order to solve the above problem, we refer to the work of the other literature [55], make improvements, and propose the initial cluster center acquisition method.
Assume that there are trajectories for clustering, and the number of supervised clustering label categories is . First, a trajectory sample is selected from each categorythat is, a total of trajectory samples are selected as the initial cluster centers, and then the sample with the largest average distance from centers is selected from the remaining − samples as the + 1-th initial cluster center, and so on until K initial cluster centers are selected.

Trajectory Clustering Results within a Group
First, we demonstrate the visualization results of clustering ten camera groups with overlapping cameras in each group. For example, the visualization effect of the trajectories in group 1 is shown in Figure 9:

Determination of the Initial Cluster Center
If the initial cluster centers for SCAIMG and SCAIBG are randomly selected, this usually results in slightly different clusters at the end when the clustering algorithm is reruned. In order to solve the above problem, we refer to the work of the other literature [55], make improvements, and propose the initial cluster center acquisition method.
Assume that there are L. trajectories for clustering, and the number of supervised clustering label categories is ζ.
First, a trajectory sample is selected from each category-that is, a total of ζ trajectory samples are selected as the initial cluster centers, and then the sample with the largest average distance from ζ. centers is selected from the remaining L − ζ samples as the ζ + 1-th initial cluster center, and so on until K initial cluster centers are selected.

Trajectory Clustering Results within a Group
First, we demonstrate the visualization results of clustering ten camera groups with overlapping cameras in each group. For example, the visualization effect of the trajectories in group 1 is shown in Figure 9: The trajectories under each camera group were obtained in chronological order. The clustering results of SCAIMG in each group are shown in Figure 10. The parameters are set to ℎ = 500, = 0.01 . The red line represents the vehicle trajectory, while the green line is the cluster center. The trajectories under each camera group were obtained in chronological order. The clustering results of SCAIMG in each group are shown in Figure 10. The parameters are set to epochs = 500, η = 0.01. The red line represents the vehicle trajectory, while the green line is the cluster center. sent camera numbers, red arrows represent the main optical axis direction of the camera lens, and different color lines represent trajectories under different cameras).
The trajectories under each camera group were obtained in chronological order. The clustering results of SCAIMG in each group are shown in Figure 10. The parameters are set to ℎ = 500, = 0.01 . The red line represents the vehicle trajectory, while the green line is the cluster center. Through the visualization effect, a small number of cluster centers represent the overall trend of trajectory data of 10 camera groups.

. Trajectory Clustering Results between Camera Groups
In this paper, we performed clustering analysis on trajectory subsegments with three and five hierarchies. There are trajectory subsegments with seven, nine, and eleven or even more hierarchies. However, in these cases, there is a small number of trajectory subsegments, making the clustering analysis lose its practical significance because clustering is to extract a large number of trajectories of the overall trend.
The number of the trajectory subsegments at different hierarchies is shown in Table 2: Table 2. Number of trajectory subsegments (three and five hierarchies). Through the visualization effect, a small number of cluster centers represent the overall trend of trajectory data of 10 camera groups.

Trajectory Clustering Results between Camera Groups
In this paper, we performed clustering analysis on trajectory subsegments with three and five hierarchies. There are trajectory subsegments with seven, nine, and eleven or even more hierarchies. However, in these cases, there is a small number of trajectory subsegments, making the clustering analysis lose its practical significance because clustering is to extract a large number of trajectories of the overall trend.
The number of the trajectory subsegments at different hierarchies is shown in Table 2: The results of vehicle trajectory clustering between groups with three hierarchies (upper) and five hierarchies (lower) are shown in Figure 11. The results of vehicle trajectory clustering between groups with three hierarchies (upper) and five hierarchies (lower) are shown in Figure 11. The trajectories between groups have a larger geographical range than those within a group. It can be seen from the visualization effect that the SCAIBG has achieved the overall trend of vehicles moving between groups. The green line represents all vehicle trajectories between groups, while the arrows in other colors represent the overall trend The trajectories between groups have a larger geographical range than those within a group. It can be seen from the visualization effect that the SCAIBG has achieved the overall trend of vehicles moving between groups. The green line represents all vehicle trajectories between groups, while the arrows in other colors represent the overall trend of vehicles moving at different hierarchies.

Cluster Effect Evaluation
The silhouette coefficient [56] is used to compare and analyze the traditional DBSCANbased method [57] with the proposed algorithm to verify the effectiveness of the proposed method. The average distance between the sample and other samples in the same cluster and the average distance between the sample and the next nearest cluster are combined for evaluation: (33) Figure 12 shows the Silhouette Coe f f icient comparison of trajectory clustering within a group. Figure 11. Schematic diagram of vehicle trajectory clustering between groups (three and five hierarchies; arrows of different colors represent cluster centers).
The trajectories between groups have a larger geographical range than those within a group. It can be seen from the visualization effect that the SCAIBG has achieved the overall trend of vehicles moving between groups. The green line represents all vehicle trajectories between groups, while the arrows in other colors represent the overall trend of vehicles moving at different hierarchies.

. Cluster Effect Evaluation
The silhouette coefficient [56] is used to compare and analyze the traditional DBSCAN-based method [57] with the proposed algorithm to verify the effectiveness of the proposed method. The average distance between the sample and other samples in the same cluster and the average distance between the sample and the next nearest cluster are combined for evaluation: Figure 12 shows the ℎ comparison of trajectory clustering within a group.   For SCAIMG, the point correspondence of trajectory is considered when trajectory clustering within a group is so that the unequal-length trajectories can move reasonably according to the corresponding relation. The distance overlapping scale factor is used to offset the unreasonable distance increase between trajectories caused by camera overlap to obtain accurate trajectory centers. However, the traditional algorithm does not consider the corresponding relation between trajectories. It can only obtain the approximate distance between trajectories when measuring the trajectory similarity, so the cluster center obtained from the clustering results can not accurately represent the overall trend of trajectories. In addition, the camera number with the most capture times is used as the clustering label for supervised clustering, considering the spatial relationship between the camera and the video scenario. Figure 12 reveals that the SCAIMG is superior to the tra- For SCAIMG, the point correspondence of trajectory is considered when trajectory clustering within a group is so that the unequal-length trajectories can move reasonably according to the corresponding relation. The distance overlapping scale factor is used to offset the unreasonable distance increase between trajectories caused by camera overlap to obtain accurate trajectory centers. However, the traditional algorithm does not consider the corresponding relation between trajectories. It can only obtain the approximate distance between trajectories when measuring the trajectory similarity, so the cluster center obtained from the clustering results can not accurately represent the overall trend of trajectories. In addition, the camera number with the most capture times is used as the clustering label for supervised clustering, considering the spatial relationship between the camera and the video scenario. Figure 12 reveals that the SCAIMG is superior to the traditional trajectory clustering algorithm in clustering each camera group in large-scale vehicle trajectory data such as CityFlowV2.
For the trajectories between groups, similar to SCAIMG, SCAIBG takes group number as the cluster label for supervised clustering and considers the spatial position of camera groups and the spatial relation of video scenario, so the clustering effect is also better than that of the traditional algorithm, as shown in Figure 13.

Algorithm Time Analysis
For SCAIMG, it is assumed that there are n video objects. It takes O(n). to obtain the overlap scale factor for each trajectory. The corresponding relation and distance between each pair of trajectories can be calculated before the iteration. The corresponding relation between each pair of trajectories can be accelerated to O(n) through coarse-graining, projection, and fine-graining [58], so it takes time ≤ n·O(n) = O n 2 , in general, Therefore, the algorithm maintains linear time complexity, proving that the algorithm is effective.
Under the hardware given, the CPU time required by SCAIBG on the CityFlowV2 dataset is shown in Table 3. The algorithm can obtain clustering results in a few seconds or even one second in groups other than groups 2 and 3. The longer time in groups 2 and 3 is because some trajectories within the group are long, taking it extra time to calculate the distance. Overall, SCAIMG can quickly extract the cluster center of large-scale vehicle trajectories.
For the SCAIBG, the clustering results in λ group of prototype vectors, with obj_num video objects. For each video object obj(u), it passes through Cnum f obj(u) camera groups. Therefore, there should be u − λ − 1 subsegment of the trajectory under hierarchy u. It takes O(K t ) to randomly select K t samples. In general, if the iterations in the learning phase reach epochs or the cluster center does not change, it will take a time of Cnum f obj(t) ·K t = O(n).
Iter max , obj num , Cnum f obj(t) , and K t represent the maximum number of iterations, the number of video objects, the number of cross-camera trajectories corresponding to video objects, and the number of cluster centers, respectively. The linear time complexity of the algorithm proves that the algorithm is effective.
Under the hardware given, the CPU time required for different hierarchies of trajectories is shown in Table 4. SCAIBG can effectively simplify the calculation of regular distance by extracting the start and end points of the trajectory. Therefore, it can be seen from Table 4 that the SCAIBG performs quite fast in extracting cluster centers between groups in large-scale vehicle datasets.

Discussion
The purpose of "Hierarchical Clustering Algorithm for Multi-camera Vehicle Trajectories" proposed in this paper is to represent vehicle video objects with similar motion patterns by obtaining different levels of video object clustering centers as the representative of vehicle video object motion patterns. The method in this paper has and is not limited to the following practical applications: (1) Multi-scale vehicle path prediction. According to priori theories, we can combine time information to analyze the multi-scale prediction of vehicle moving path. (2) Urban Planning and Road Planning. Different levels of clustering results, combined with road constraints, provide effective experience and technical support for urban planning departments and traffic management departments. (3) Trajectory outliers detection. The clustering results can provide a prerequisite for vehicle trajectory outliers detection. Trajectory outliers are a small number of trajectories that are obviously different from other trajectories data in the trajectory data set, so vehicle trajectory outliers can be analyzed through the distance between the vehicle trajectories and clustering centers.
In the above applications, we think that the trajectory clustering algorithm should ensure high accuracy and efficiency. High accuracy is the premise of the accuracy of the follow-up work, and efficiency is the rigid requirement of some practical applications. For example, path prediction analysis is applied to navigation planning and trajectory anomaly detection to security departments. Rapid cluster analysis enables these applications to be implemented safely.
In this paper, the elbow method is used to determine the optimal number of clusters in clustering analysis, but this method may have some shortcomings in practical application, because the vehicle trajectory has road constraints, and there is a deviation simply according to the elbow method to determine the clustering center without taking into account the actual situation.
Therefore, in the future scientific research work, the final number of clustering centers should be determined by combining the algorithm and road constraints.

Conclusion and Prospects
This paper proposed a multi-camera vehicle trajectory hierarchical clustering algorithm based on spatio-temporal grouping, aiming at the problem that the traditional trajectory clustering algorithm did not consider the camera position information, the field of view, and the hierarchical relation of the video object movement between the camera and the scenario. This method used camera number and group number as labels and was a supervised clustering algorithm considering camera information applied to trajec-tories within and between groups, respectively, enabling trajectories to be classified into reasonable clustering centers.
The experiment showed the clustering results and the comparison and analysis of silhouette coefficients between the proposed method and other clustering methods, and the time complexity analysis proved the proposed algorithm's accuracy, effectiveness, and high efficiency. The limitation of this paper lies in the lack of trajectory clustering analysis combined with spatial semantic features, which will be improved in future research.